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Abstract In the past decades, exactly recovering the intrin- 
sic data structure from corrupted observations, which is known 
as robust principal component analysis (RPCA), has attracted 
tremendous interests and found many applications in com- 
puter vision. Recently, this problem has been formulated as 
recovering a low-rank component and a sparse component 
from the observed data matrix. It is proved that under some 
suitable conditions, this problem can be exactly solved by 
principal component pursuit (PCP), i.e., minimizing a com- 
bination of nuclear norm and li norm. Most of the existing 
methods for solving PCP require singular value decompo- 
sitions (SVD) of the data matrix, resulting in a high com- 
putational complexity, hence preventing the applications of 
RPCA to very large scale computer vision problems. In this 
paper, we propose a novel algorithm, called h filtering, for 
exactly solving PCP with an 0(r^ (m+n)) complexity, where 
m X n is the size of data matrix and r is the rank of the ma- 
trix to recover, which is supposed to be much smaller than 
m and n. Moreover, li filtering is highly parallelizable. It is 
the first algorithm that can exactly solve a nuclear norm min- 
imization problem in linear time (with respect to the data 
size). Experiments on both synthetic data and real applica- 
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tions testify to the great advantage of h filtering in speed 
over state-of-the-art algorithms. 
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1 Introduction 

Robustly recovering the intrinsic low-dimensional structure 
of high-dimensional visual data, which is known as robust 
principal component analysis (RPCA), plays a fundamental 
role in various computer vision tasks, such as face image 
alignment and processing, video denoising, structure from 
motion, background modeling, photometric stereo and tex- 
ture representation (see e.g., Wright et al| ( |2009| ), Ji et al 



( |20Tq1 ), |De la Torre and Black] ( [2003] ), [Peng et al| ( |20T0l ), 
Wu et al| ( [2QTQ| , and |Zhang etaTl ( |2Q12| ), to name just a few). 



Through the years, a large number of approaches have been 
proposed for solving this problem. The representative works 
include |De la Torre and Bl acF('200T),"Nie et ar('20Tr),'Aanes| 
[eFall ( |2QQ2l ), [Baccini et al, ( .1996j , .Ke and Kanade. (2QQ5| , 
[Skocaj et al| ( [2007] ), and |Storer et all ( [2009] ). The main limita- 
tion of above mentioned methods is that there is no theoret- 
ical guarantee for their performance. Recently, the advances 
in compressive sensing have led to increasingly interests in 
considering RPCA as a problem of exactly recovering a low- 
rank matrix Lq from corrupted observations M = Lq + Sq, 
where Sq is known to be sparse ( [Wright etal ( 2009 ), Candes 
[et al| ( |20lT] )). Its mathematical model is as follows: 



minrank(L) + A||S| 



s.t. M = L + S, 



(1) 



where || • ||^q is the Iq norm of a matrix, i.e., the number of 
nonzero entries in the matrix. 
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Unfortunately, problem ([T]) is known to be NP-hard. So 
Candes et al ( 2Q11| ) proposed using principal component 
pursuit (PCP) to solve ([T]), which is to replace the rank func- 
tion and Iq norm with the nuclear norm (which is the sum of 
the singular values of a matrix, denoted as || • || *) and h norm 
(which is the sum of the absolute values of the entries), re- 
spectively. More specifically, PCP is to solve the following 
convex problem instead: 



min IILII 



•AIISI 



s.t. M = L + S. 



(2) 



They also rigorously proved that under fairly general condi- 
tions and A = 1/ Y^max(m, n), PCP can exactly recover the 
low-rank matrix Lq (namely the underlying low-dimensional 
structure) with an overwhelming probability, i.e., the differ- 
ence of the probability from 1 decays exponentially when 
the matrix size increases. This theoretical analysis makes 
PCP distinct from previous methods for RPCA. 

All the existing algorithms for RPCA need to compute 
either SVD or matrix-matrix multiplications on the whole 
data matrix. So their computation complexities are all at 
least quadratic w.r.t. the data size, preventing the applica- 
tions of RPCA to large-scale problems when the time is crit- 
ical. In this paper, we address the large-scale RPCA prob- 
lem and propose a truly linear cost method to solve the PCP 
model ^ when the data size is very large while the target 
rank is relatively small. Such kind of data is ubiquitous in 
computer vision. 



1 . 1 Main Idea 
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Fig. 1 Illustration of the proposed h filtering method. A large ob- 
served data matrix M (a) is the sum of a low-rank matrix Lq (b) and 
a sparse matrix So (c). The method first recovers a seed matrix (a sub- 
matrix of Lq) (e). Then the submatrices (f) and (g) can be 
recovered by column and row filtering, respectively, where U and 
are the column space and row space of L^, respectively. Then the com- 
plement matrix (h) can be represented by L*, and L^. Finally, 
we obtain the computed low-rank matrix L* (d), which is identical to 
Lo with an overwhelming probability. 



linear time cost, the proposed algorithm is also highly paral- 
lel: the columns of and the rows of can be recovered 
fully independently. We also prove that under suitable con- 
ditions, our method can exactly recover the underling low- 
rank matrix Lq with an overwhelming probability. To our 
best knowledge, this is the first algorithm that can exactly 
solve a nuclear norm minimization problem in linear time. 



Our algorithm fully utilizes the properties of low-rankness. 
The main idea is to apply PCP to a randomly selected sub- 
matrix of the original noisy matrix and compute a low rank 
submatrix. Using this low rank submatrix, the true low rank 
matrix can be estimated efficiently, where the low rank sub- 
matrix is part of it. 

Specifically, our method consists of two steps (illustrated 
in Figure[T]). The first step is to recover a submatri}|^L^ (Fig- 
ure [T](e)) of Lq. We call this submatrix the seed matrix be- 
cause all other entries of Lq can be further calculated by this 
submatrix. The second step is to use the seed matrix to re- 
cover two submatrices and (Figures [T](f)-(g)), which 
are on the same rows and columns as in Lq, respectively. 
They are recovered by minimizing the li distance from the 
subspaces spanned by the columns and rows of L*, respec- 
tively. Hence we call this step li filtering. The remaining 
part Lg (Figure [l] (h)) of Lq can be represented by L^, 



and L^, using the generalized Nystrom method (Wang et al 



( |2QQ9| )). As analyzed in Section [34] our method is of linear 
cost with respect to the data size. Besides the advantage of 

^ Note that the "submatrix" here does not necessarily mean that we 
have to choose consecutive rows and columns from M. 



2 Previous Works 

In this section, we review some previous algorithms for solv- 
ing PCP. The existing solvers can be roughly divided into 
three categories: classic convex optimization, factorization 
and compressed optimization. 

For small sized problems, PCP can be reformulated as 
a semidefinite program and then be solved by standard in- 
terior point methods. However, this type of methods can- 
not handle even moderate scale matrices due to their O(n^) 
complexity in each iteration. So people turned to first-order 
algorithms, such as the dual method (I Ganesh et al ( |2009| )), 
the accelerated proximal gradient (APG) method ( [Ganesh 
2QQ9| )) and the alternating direction method (ADM) 



et al 



( |Lin etal| ( |2QQ9l )), amonR which ADM is the most efficient. 



All these methods require solving the following kind of sub- 
problem in each iteration 



min7^||A|| 



A- W 



(3) 
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where || • ||f is the Frobenious norm. Cai et al ( 2Q1Q| proved 
that the above problem has a closed form solution 



A = U5^(5])V^, 



(4) 



where U5]V^ is the singular value decomposition of W 
and Srj{x) = sgn(x) max(|x| — 7^,0) is the soft shrink- 
age operator. Therefore, these methods all require comput- 
ing SVDs for some matrices, resulting in 0{mn min(m, n)) 
complexity, where m x n is the matrix size. 

As the most expensive computational task required by 
solving ([2]) is to perform SVD,I Lin et al|pQ09) ) adopted par- 
tial S VD ( |Larsen| ( |1998| )) to reduce the complexity at each 
iteration to O(rmn), where r is the target rank. However, 
such a complexity is still too high for very large data sets. 
Drineas etaT| ( |2QQ6| ) developed a fast Monte Carlo algorithm, 
named linear time SVD (LTSVD), which can be used for 
solving SVDs approximately (also see Halko et al ( 2011| )). 
The main drawback of LTSVD is that it is less accurate than 
the standard SVD as it uses random sampling. So the whole 
algorithm needs more iterations to achieve the same accu- 
racy. As a consequence, the speed performance of LTSVD 
quickly deteriorates when the target rank increases (see Fig- 
ure [2]). Actually, even adopting LTSVD the whole algorithm 
is still quadratic w.r.t. the data size because it still requires 
matrix-matrix multiplication in each iteration. 

To address the scalability issue of solving large-scale 
PCP problems, |Shen et al| ( |2Q11| ) proposed a factorization 
based method, named low-rank matrix fitting (LMaFit). This 
approach represents the low-rank matrix as a product of two 
matrices and then minimizes over the two matrices alter- 
nately. Although they do not require nuclear norm mini- 
mization (hence the SVDs), the convergence of the proposed 
algorithm is not guaranteed as the corresponding problem is 
non-convex. Moreover, both the matrix-matrix multiplica- 
tion and the QR decomposition based rank estimation tech- 
nique require 0{rmn) complexity. So this method does not 
essentially reduce the complexity. 



Inspired by compressed optimization, Mu et al ( 2011 ) 
proposed reducing the problem scale by random projection 
(RP). However, this method is highly unstable - different 
random projections may lead to radically different results. 
Moreover, the need to introduce additional constraint to the 
problem slows down the convergence. And actually, the com- 
plexity of this method is also 0{pmn), where p x m is the 
size of the random projection matrix and p > r. So this 
method is still not of linear complexity with respect to the 
matrix size. 



is to recover Lq from M. What our approach differs from 
traditional ones is that the underlying low-rank matrix Lq 
is reconstructed from a seed matrix. As explained in Sec- 
tion [LT} our 1 1 filtering algorithm consists of two steps: first 
recovering a seed matrix, second performing li filtering on 
corresponding rows and columns of the data matrix. Below 
we provide details of these two steps. 



3.1 Seed Matrix Recovery 

Suppose that the target rank r is very small compared with 
the data size: r <C min(m, n). We first randomly sample an 
{srv) X {scv) submatrix from M, where Sr > 1 and 
Sc > 1 are the row and column oversampling rates, respec- 
tively. Then the submatrix of the underlying matrix Lq 
can be recovered by solving a small sized PCP problem: 



min IIL^ 



•A||S^ 



s.t. M' 



(5) 



e.g., using ADM ( Lin et al ( 2009 )), where A = 1/ A/max(s^r, 
By Theorem 1 . 1 in ( Candes et al ( 201 1| )), the seed matrix 
can be exactly recovered from with an overwhelm- 
ing probability when Sr and Sc increases. In fact, by that 
theorem Sr and Sc should be chosen at the scale of 0(ln^ r). 
For the experiments conducted in this paper, whose r's are 
very small, we simply choose Sc = Sr = 10. 



3.2 /i Filtering 

For ease of illustration, we assume that is the top left 
(srv) X (scv) submatrix of M. Then accordingly M, Lq 
and So can be partitioned into: 



M 















7 Lq — 




7 So — 





(6) 



Since rank(Lo) = rank(L^) = r, there must exist matrices 
Q and P, such that 



L"Q and = L^ 



(7) 



As Sq is sparse, so are and S^. Therefore, Q and P can 
be found by solving the following problems: 



min IIS" 
and 



s.t. M" = L"Q + S^ 



(8) 



3 The h Filtering Algorithm 

Given an observed data matrix M G R^^^, which is the 
sum of a low-rank matrix Lq and a sparse matrix Sq, PCP 



min||Sni^i, s.t. M" = P^L^ + S^ 



(9) 



respectively. The above two problems can be easily solved 
by ADM. 
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With Q and P computed, and are obtained as 
Again by rank(Lo) = rank(L*) = r, the generaUzed 
Ny Strom method ( [Wang et al| ( |2009| )) gives: 

where (L^)^ is the Moore-Penrose pseudo inverse of L^. 

In real computation, as the SVD of is readily avail- 
able when solving ([5]), due to the singular value thresholding 
operation (|4]), it is more convenient to reformulate ^ and 
([51) as 



Algorithm 1 Solving ([14]) by ADM 



and 

minllSni;, 

S-,P 



s.t. M" = U"Q + S^ 



s.t. = P^(V^)^ + S^ 



(11) 



(12) 



respectively, where U*5]*(V*)^ is the skinny SVD of 
obtained from ([4]) in the iterations. Such a reformulation has 
multiple advantages. First, as (U^)^U^ = (V^)^V^ = I, 
it is unnecessary to compute the inverse of (U^)^U* and 
(V^)^V^ when updating Q and P in the iterations of ADM. 
Second, computing ( [T0| ) also becomes easy if one wants to 
form explicitly because now 

= P^(5]")-^Q. (13) 

To make the algorithm description complete, we sketch 
in Algorithm [T] the ADM for solving ( (TTj ) and ( [T2| ), which 
are both of the following form: 



min||E||,, 



s.t. X = AZ + E, 



(14) 



where X and A are known matrices and A has orthonormal 



columns, i.e., A^A = I. The ADM for ( 14) is to minimize 



on the following augmented Lagrangian function 



|E||,, + (Y,X-AZ-E) 



X- AZ-EllI, (15) 



with respect to E and Z, respectively, by fixing other vari- 
ables, and then update the Lagrange multiplier Y and the 
penalty parameter f3^ 

Note that it is easy to see that ( [TT] ) and ( [T2| ) can also 
be solved in full parallelism as the columns and rows of 
and can computed independently, thanks to the decom- 
posability of the problems. So the recovery of and is 
very efficient if one has a parallel computing platform, such 
as a general purpose graphics processing unit (GPU). 

3.3 The Complete Algorithm 

Now we are able to summarize in Algorithm [2] our li filter- 
ing method for solving PCP, where steps 3 and 4 can be done 
in parallel. 

^ The ADM for solving PCP follows the same methodology. As a 
reader can refer to ( Lin et al| ( [2009) , |Yuan and Yang| { |2009 )) for details, 
we omit the pseudo code for using ADM to solve PCP. 



Input: X and A. 

Initialize: Set Eq, Zq and Yq to zero matrices. Set e > 0, p > 1 
and ^ > ^0 > 0. 

while ||X - AZfc - Efc||i^/||X||i^ > £ do 

Step 1: Update E^+i = S^-i (X - AZ^ + Yk//3k), where S 

is the soft- thresholding operator ( [Cai et al| ( |2010} ). 

Step 2: Update Z^+i = A^(X - E^+i + Yfe//3fe). 

Step 3: Update Y^+i = + /5fc(X - AZ^+i - E^+i) and 
= min{p/3k,f3). 
end while 



Algorithm 2 The h Filtering Method for Solving PCP ^ 

Input: Observed data matrix M. 

Step 1: Randomly sample a submatrix M^. 

Step 2: Solve the small sized PCP problem (|5}, e.g., by ADM, to 

recover the seed matrix . 

Step 3: Reconstruct by solving |T7) . 

Step 4: Reconstruct b y so lving |T2} . 

Step 5: Represent by {is). 

Output: Low-rank matrix L and sparse matrix S = M — L. 



3.4 Complexity Analysis 

Now we analyze the computational complexity of the pro- 
posed Algorithm [2] For the step of seed matrix recovery, the 
complexity of solving ^ is only O(r^). For the h filter- 
ing step, it can be seen that the complexity of solving ([TT]) 
and (12) is 0{r'^n) and 0{r'^m), respectively. So the total 
complexity of this step is 0{r'^{m + n)). As the remaining 
part iis of Lo can be represent ed by L^, and L ^, using the 
generalized Nystrom method ( Wang et al ( |20Q9t J^and recall 
that r <C min(m, n), we conclude that the overall complex- 



ity of Algorithm|2]is 0{r'^{m + n)), which is only of linear 
cost with respect to the data size. 



3.5 Exact Recoverability of h Filtering 

The exact recoverability of Lq using our li filtering method 
consists of two factors. First, exactly recovering from 
M*. Second, exactly recovering and L^. If all L*, L^, 
and can be exactly recovered, Lq is exactly recovered. 

The exact recoverability of from is guaranteed by 
Theorem 1.1 of ( |Candes et al| ( |20H"] )). When Sr and Sc are 
sufficiently large, the chance of success is overwhelming. 

To analyze the exact recoverability of and L^, we 
first observe that it is equivalent to the exact recoverability 
of and S^. By multiplying annihilation matrices U^'^ 
and V^'^ to both sides of (11) and (12), respectively, we 



^ Of course, if we explicitly form then this step costs no more 
than rmn complexity. Compared with other methods, our rest compu- 
tations are all of 0{r'^{m + n)) complexity at the most, while those 
methods all require at least 0(rmn) complexity in each iteration, 
which results from matrix-matrix multiplication. 
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may recover and by solving 

min ||S"||^,, s.t. U^'^M^ = U^'^S^ 

and 

min ||Snk, s.t. M^(V^'^)^ = S^(V^'^)^, 



(17) 



respectively. If the oversampling rates Sc and Sr are large 
enough, we are able to choose U*'^ and V*'^ that are close 
to Gaussian random matrices. Then we may apply the stan- 
dard theory in compressed sensing (tCandes and Wakin ( 2Q07| )) 
to conclude that if the oversampling rates Sc and Sr are large 
enough and and are sparse enouglj^ and can be 
exactly recovered with an overwhelming probability. 

We also present an example in Figure [T] to illustrate the 
exact recoverability of h filtering. We first truncate the SVD 
of an 1024 x 768 image "Water"|^to get a matrix of rank 30 
(Figure[T](b)). The observed image (Figure[T](a)) is obtained 
from Figure [T](b) by adding large noise to 30% of the pix- 
els uniformly sampled at random (Figure [T](c)). Suppose we 
have the top-left 300 x 300 submatrix as the seed (Figure [T] 
(e)), the low-rank image (Figure [T] (d)) can be exactly re- 
covered by li filtering. Actually, the relative reconstruction 
errors in L* is only 7.03 x 10~^. 



3.6 Target Rank Estimation 

The above analysis and computation are all based on a known 
value of the target rank r. For some applications, we could 
have an estimate on r. For example, for the background 
modeling problem (^ Wright et al| ( |2009 )), the rank of the 
background video should be very close to one as the back- 
ground hardly changes; and for the photometric stereo prob- 



lem ( |Wu et al| ( |2Q1Q| )) the rank of the surface normal map 
should be very close to three as the normals are three di- 
mensional vectors. However, the rank r of the underlying 
matrix might not always be known. So we have to provide a 
strategy to estimate r. 

As we assume that the size x of submatrix is 
(srv) X (scv), where Sr and Sc should be sufficiently large in 
order to ensure the exact recovery of from M*, after we 
have computed by solving ([5]), we may check whether 



m' /r' > Sr and n' /r' > Sc 



(18) 



are satisfied, where r' is the rank of L^. If yes, is ac- 
cepted as a seed matrix. Otherwise, it implies that m' x n' 

^ As the analysis in the compressed sensing theories is qualitative 
and the bounds are actually pessimistic, copying those inequalities 
here is not very useful. So we omit the mathematical descriptions for 
brevity. 



The image is available at http://www.petitcolas.net/fabien/ 



watermarking/image_database/ 



may be too small with respect to the target rank r. Then we 
may increase the size of the submatrix to {srv') x (scV^) and 



(16) repeat the above procedure until ([18]) is satisfied or 



max(mYm, nY^) > 0.5. 



(19) 



We require (T9\ because the speed advantage of our li fil- 
tering algorithm will quickly lost beyond this size limit (see 
Figure |2]). If we have to use a submstrix whose size should 
be greater than (0.5m) x (0.5n), then the target rank should 
be comparable to the size of data, hence breaking our low- 
rank assumption. In this case, we may resort to the usual 
method to solve PCP. 

Of course, we may sample one more submatrix to cross 
validate the estimated target rank r. When r is indeed very 
small, such a cross validation is not a big overhead. 

4 Experimental Results 

In this section, we present experiments on both synthetic 
data and real vision problems (structure from motion and 
background modeling) to test the performance of li filtering. 
All the experiments are conducted and timed on the same PC 
with an AMD Athlon® II X4 2.80GHz CPU that has 4 cores 
and 6GB memory, running Windows 7 and Matlab (Version 
7.10). 

4.1 Comparison Results for Solving PCP 

We first test the performance of h filtering on solving PCP 
problem ([2]). The experiments are categorized into the fol- 
lowing three classes: 



1. Compare with classic numerical solvers (e.g., ADM LinJ 
[etaI| ( |2QQ9l ) and its variation, denoted as LTSVD ADM, 
which uses linear-time SVD |Drineas et al|(20Q6| ) to solve 
the partial SVD in each iteration) on randomly generated 
low-rank and sparse matrices. 

2. Compare with factorization based solver (e.g., LMaFit 



Shen et al ( 2Q11| ) on recovering either randomly gener- 



ated or deterministic low-rank matrix from its sum with 
a random sparse matrix. 
3. Compare with random projection based solver (e.g., ran- 
dom projection'Mu et al ( 2011| )) on recovering randomly 
generated low-rank and sparse matrices. 

In the experiments synthetic data, we generate random 
test data in the following way: an m x m observed data ma- 
trix M is synthesized as the sum of a low-rank matrix Lq and 
a sparse matrix Sq. The rank r matrix Lq is generated as a 
product of two mxr matrices whose entries are i.i.d. Gaus- 
sian random variables with zero mean and unit variance. The 
matrix So is generated as a sparse matrix whose support is 
chosen uniformly at random, and whose p non-zero entries 
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are i.i.d. uniformly in [—500, 500]. The rank ratio and spar- 
sity ratio are denoted as = r/m and ps = p/m?, respec- 
tively. 

4.7.7 li Filtering vs. Classic Convex Optimization 

Firstly, we compare our approach with ADM on the whole 
matri:?!^ which we call the standard ADM, and its variation, 
which uses linear-time SVD (LTSVDf]for solving the par- 
tial SVD, hence we call the LTSVD ADM. We choose these 
two approaches because the standard ADM is known to be 
the most efficient classic convex optimization algorithm to 
solve PCP exactly and LTSVD ADM has a linear time cost 
in solving SVE[^ For LTSVD ADM, in each time to com- 
pute the partial SVD we uniformly oversample 5r columns 
of the data matrix without replacement. Such an oversam- 
pling rate is important for ensuring the numerical accuracy 
of LTSVD ADM at high probability. For all methods in com- 
parison, the stopping criterion is||M-L*-S*||F/||M||F < 
10-^. 

Table [T] shows the detailed comparison among the three 
methods, where RelErr = ||L* — Lo||f/||Lo||f is the rela- 
tive error to the true low-rank matrix Lq. It is easy to see that 
our li filtering approach has the highest numerical accuracy 
and is also much faster than the standard ADM and LTSVD 
ADM. Although LTSVD ADM is faster than the standard 
ADM, its numerical accuracy is the lowest among the three 
methods because it is probabilistic. 

We also present in Figure |2] the CPU times of the three 
methods when the rank ratio pr and sparsity ratio ps in- 
creases, respectively. The observed matrices are generated 
using the following parameter settings: m = 1000, vary 
Pr from 0.005 to 0.05 with fixed ps = 0.02 and vary ps 
from 0.02 to 0.2 with fixed pr = 0.005. It can be seen 
from Figure [2] (a) that LTSVD ADM is faster than the stan- 
dard ADM when pr < 0.04. However, the computing time 
of LTSVD ADM grows quickly when pr increases. It even 
becomes slower than the standard ADM when pr > 0.04. 
This is because LTSVD cannot guarantee the accuracy of 
partial SVD in each iteration. So its number of iterations is 
larger than that of the standard ADM. In comparison, the 
time cost of our li filtering method is much less than the 
other two methods for all the rank ratios. However, when 
Pr further grows the advantage of h filtering will be lost 
quickly, because h filtering has to compute the PCP on the 
(srv) X (scv) = (lOr) X (lOr) submatrix M^. In contrast. 



Figure [2] (b) indicates that the CPU time of these methods 
grows very slowly with respect to the sparsity ratio. 



^ The Matlab code of ADM is provided by the authors of i Lin et al 
( [2009} ) and all the parameters in this code are set to their default val- 
ues. 

^ The Matlab code of linear-time SVD is available in the FPCA 
package at http://www.columbia.edu/~sm2756/FPCA.htm 

^ However, LTSVD ADM is still of Oirmn) complexity as it in- 
volves matrix-matrix multiplication in each iteration. See also Sec- 
tion[2] 




Fig. 2 Performance of the standard ADM (S-ADM for short), LTSVD 
ADM (L-ADM for short) and li filtering under different rank ratios pr 
and sparsity ratios ps, where the matrix size is 1000 x 1000. The x-axis 
represents the rank ratio (a) or sparsity ratio (b). The y-axis represents 
the CPU time (in seconds). 



4.7.2 li Filtering vs. Factorization Method 

We then compare the proposed h filtering with a factor- 
ization method (i.e., LMaFij^ on solving To test the 
ability of these algorithms in coping with corruptions with 
large magnitude, we multiply a scale a to the sparse matrix, 
i.e., M = Lo + ctSq. We fix other parameters of the data 
(m = 1000, r = 0.01m and ps = 0.01) and vary the scale 
parameter a from 1 to 10 to increase the magnitude of the 
sparse errors. 

The computational comparisons are presented in Fig- 
urejS] Besides the CPU time and relative error, we also mea- 
sure the quality of the recovered L* by its maximum dif- 
ference (MaxDif) and average difference (AveDif) to the 
true low-rank matrix Lq, which are respectively defined as 
MaxDif = max(|L* - Lo|) and AveDif = {^.. |L* - 
Lq I ) /m^ . One can see that the performance of LMaFit dra- 
matically decreases when a > 3. This experiment suggests 
that the factorization method fails when the sparse matrix 
dominates the low-rank one in magnitude. This is because 
a sparse matrix with large magnitudes makes rank estima- 
tion difficult or impossible for LMaFit. Without a correct 
rank, the low-rank matrix cannot be recovered exactly. In 
comparison, our li filtering always performs well on the test 
data. 

In the following, we consider the problem of recovering 
deterministic low-rank matrix from corruptions. We gener- 
ate an m X m "checkerboard" image (see Figure]?]), whose 
rank is 2, and corrupt it by adding 10% impulsive noise to 
it. The corruptions (nonzero entries of the sparse matrix) 
are sampled uniformly at random. The image size m ranges 
from 1000 to 5000 with an increment 500. 



^ The Matiab code of LMaFit is provided by the authors of |Shen 
|et al| ( |20rip ) and all the parameters in this code are set to their default 
values. 
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Table 1 Comparison among the standard ADM (S-ADM for short), LTSVD ADM (L-ADM for short) and h filtering method (h for short) on the 
synthetic data. We present CPU time (in seconds) and the numerical accuracy of tested algorithms. Lq and So are the ground truth and L* and S* 
are the solution computed by different methods. For the h filtering method, we report its computation time sls t = ti -\- 12, where t,ti and t2 are 
the time for total computation, seed matrix recovery and filtering, respectively. 



Size (m) 


Method 


RelErr 


rank(L*) 


l|L*||* 


l|S*||^o 


l|s*lk 


Time 






rank(Lo) = 


20, 


||Lo II* 


= 39546, 


llSolk = 40000, llSol 


998105 


2000 


S-ADM 
L-ADM 

h 


1.46 xlO-« 
4.72 xlQ-^ 
1.66 xlQ-s 


20 
20 
20 


39546 
39546 
39546 


39998 
40229 
40000 


998105 
998105 
998105 


84.73 
27.41 
5.56 = 2.24 + 3.32 






rank(Lo) = 50, 


|Lo||* = 


249432, 


\So\\i^ = 250000, ||So| 


= 6246093 


5000 


S-ADM 
L-ADM 

h 


7.13 xlO-y 
4.28 xlO-^ 
5.07 xlO-9 


50 
50 
50 


249432 
249432 
249432 


249995 
250636 
250000 


6246093 
6246158 
6246093 


1093.96 
195.79 
42.34=19.66 + 22.68 






rank(Lo) = 100, | 


Loll* = 


997153, 1 


Sollio = 1000000, ||So 


\\i^ = 25004070 


10000 


S-ADM 
L-ADM 

h 


1.23 xl0-« 
4.26 xlO-^ 
2.90 X 10-10 


100 
100 
100 


997153 
997153 
997153 


1000146 
1000744 
1000023 


25004071 
25005109 
25004071 


11258.51 
1301.83 
276.54 = 144.38 + 132.16 




Sparsity Magnitude (a) 

(a) 



Sparsity Magnitude (a) 

(b) 



Sparsity Magnitude (a) 

(c) 



Sparsity Magnitude (a) 

(d) 



Fig. 3 Performance of LMaFit and h filtering under different sparsity magnitudes {a e [1, 10]). The x-axes represent the sparsity magnitudes and 
the y-axes represent the CPU time (in seconds) (a), "RelErr" (b), "MaxDif" (c) and "AveDif" (d) in log scale, respectively. 





(a) Corrupted 



(b) LMaFit 



(c)/i 



(d) CPU Time 



Fig. 4 Recovery results for "checkerboard", (a) is the image corrupted by 10% impulsive noise, (b) is the image recovered by LMaFit. (c) is the 
image recovered by h filtering (/i). (d) CPU time (in seconds) vs. data size (m G [1000, 5000]). 



The results for this test are shown in Figure |4] where the 
first image is the corrupted checkerboard image, the second 
image is recovered by LMaFit and the third by li filtering. 
A more complete illustration for this test can be seen from 
Figure |4jd), where the CPU time corresponding to all tested 
data matrix sizes are plotted. It can be seen that the images 
recovered by LMaFit and li filtering are visually compara- 
ble in quality. The speeds of these two methods are very 
similar when the data size is small, while h filtering runs 
much faster than LMaFit when the matrix size increases. 
This concludes that our approach has significant speed ad- 
vantage over the factorization method on large scale data 
sets. 



4.1.3 1 1 Filtering vs. Compressed Optimization 

Now we compare h filtering with a compressed optimiza- 
tion method (i.e., random projectioif^. This experiment is 
to study the performance of these two methods with respect 
to the rank of the matrix and the data size. The parameters 
of the test matrices are set as follows: ps = 0.01, pr vary- 
ing from 0.05 to 0.15 with fixed m = 1000, and m varying 
from 1000 to 5000 with fixed pr = 0.05. For the dimension 



The Matlab code of random projection (RP) is provided by the 
author of ^Mu et al| ( |2011 ^) and all the parameters in this code are set 
to their default values. 
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of the projection matrix (i.e., p), we set it as p = 2r for all 
the experiments. 

As shown in Figure [5j in all cases the speed and the nu- 
merical accuracy of li filtering are always much higher than 
those of random projection. 




(c) 



(d) 



Fig. 5 Performance of random projection (RP for short) and fil- 
tering. (a)-(b) are the comparison under different rank ratios (pr G 
[0.05,0.15]). (c)-(d) are the comparison under different data sizes 
(m e [1000, 5000]). In (a) and (c), the y-axes are the CPU times (in 
seconds). In (b) and (d), the y-axes are the relative errors in log scale. 



4.2 Structure from Motion 



angles between and 27r, with a step size tt/IOOO, and uni- 
formly randomly generated translations) to the 3D "wolf" 
object which contains 4344 3D points. Then we add im- 
pulsive noises Sq (the locations of the nonzero entries are 
uniformly sampled at random) to part (e.g., 5% or 10%) of 
the feature points (see Figure |6]). In this way, we obtain cor- 
rupted observations M = Lq + So with a size 4002 x 4344. 

We apply our li filtering to remove outliers (i.e., Sq) and 
compute the affine motion matrix A and the 3D coordinates 
B from the recovered features (i.e., Lq). For comparison, 
we also incl ude the results from the robust subspace learn- 
ing (RSL) ( jDe la Torre and Black] (|2003 and standard 
PCP (i.e., S-ADM based PCP). In Figure [Tj we show the 
original 3D object, SfM results based on noisy trajectories 
and trajectories recovered by RSL, standard PCP and h fil- 
tering, respectively. It is easy to see that the 3D reconstruc- 
tion of RSL fails near the front legs and tail. In contrast, 
the standard PCP and li filtering provide results with almost 
the same quality. Table [2] further compares the numerical 
behaviors of these methods. We measure the quantitative 
performance for SfM by the well-known mean 2D repro- 
jection error, which is denoted as "ReprojErr" and defined 
by the mean distance of the ground truth 2D feature points 
and their reprojections. We can see that the standard PCP 
provides the highest numerical accuracy while its time cost 
is extremely high (9 times slower than RSL and more than 
100 times slower than li filtering). Although the speed of 
RSL is faster than standard PCP, its numerical accuracy is 
the worst among these methods. In comparison, our li fil- 
tering achieves almost the same numerical accuracy as stan- 
dard PCP and is the fastest. 



In this subsection, we apply h filtering to a real world vi- 
sion application, namely structure from motion (SfM). The 
problem of SfM is to automatically recover the 3D struc- 
ture of an object from a sequence of images of the object. 
Suppose that the object is rigid, there are F frames and P 

Tx' 

tracked feature points (i.e., Lq = ^ ), and the cam- 

2FxP 



era intrinsic parameters do not change. As shown in ( |Rao 
et allpOlO] )), the trajectories of feature points from a single 
rigid motion of the camera all lie in a liner subspace of M?^ , 
whose dimension is at most four (i.e., rank(Lo) < 4). It has 
been shown that Lq can be factorized as Lq = AB, where 
A G R^^^^ recovers the rotations and translations while 
the first three rows of B G R^^^ encode the relative 3D 
positions for each feature point in the reconstructed object. 
However, when there exist errors (e.g., occlusion, missing 
data or outliers) the feature matrix is no longer of rank 4. 
Then recovering the full 3D structure of the object can be 
posed as a low-rank matrix recovery problem. 

For this experiment, we first generate the 2D feature 
points Lq by applying an affine camera model (with rotation 



4.3 Background Modeling 

In this subsection, we consider the problem of background 
modeling from video surveillance. The background of a group 
of video surveillance frames are supposed to be exactly the 
same and the foreground on each frame is recognized as 
sparse errors. Thus this vision problem can be naturally for- 
mulated as recovering the low-rank matrix from its sum with 
sparse errors ( Candes et al pOll )). We compare our li fil- 
tering with the baseline median filtef^ and other state-of- 
the-art robust approaches, such as RSL and S-PCP. When 
median filtering using all the frames, the complexity is actu- 
ally also quadratic and the results are not good. So we only 
buffer 20 frames when using median filter to compute the 
background. For li filtering, we set the size of the seed ma- 
trix as 20 X 20. 



The 3D "wolf" data is available at: http://tosca.cs.technion.ac.il/] 
The Matlab code of RSL is available at 
| http://www.salleurl.edu/ ^ftorre/papers/rpca/rpca.zip and the pa- 
rameters in this code are set to their default values. 

Please refer to http://en.wikipedia.org/wiki/Median_filter 
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Fig. 6 The illustrations of some trajectories (2D image frames) generated by the 3D "wolf" object (300th, 500th, • • • , 1300th, 1500th frames). Top 
row: the ground truth trajectories. Bottom row: 10% corrupted trajectories. 




(a) Original (b) Corrupted (c) RSL (d) S-PCP (e) /i -filtering 



Fig. 7 The SfM reconstruction, (a) is the original 3D object, (b)-(e) are SfM results using corrupted trajectory and the trajectories recovered by 
RSL, standard PCP (S-PCP for short) and h filtering, respectively. 

Table 2 Comparison among RSL, S-PCP and filtering on the structure from motion problem. We present CPU time (in seconds) and the 
numerical accuracy of tested algorithms. Lq and So are the ground truth and L* and S* are the solution computed by different methods. 



Noisy Level 


Method RelErr rank(L*) 118*11^^ Time MaxDif(L*) AveDif(L*) ReprojErr 


5% 


rank(Lo) = 4, \\So\\i^ = 869234 


RSL 


0.0323 


4 (fixed) 


15384229 


93.05 


32.1731 


0.4777 


0.9851 


S-PCP 


5.18 X 10-y 


4 


869200 


848.09 


1.70 X 10-^ 


2.47 X 10-« 


4.18 X 10-« 


h 


1.16 X 10-« 


4 


869644 


6.46 


1.80 X 10-^ 


3.61 X 10- ^ 


4.73 X 10- ^ 


10% 


rank(Lo) = 4, \\So\\i^ = 1738469 


RSL 


0.0550 


4 (fixed) 


16383294 


106.65 


38.1621 


0.9285 


1.8979 


S-PCP 


6.30 X 10-y 


4 


1738410 


991.40 


1.57 X 10-^ 


4.09 X 10-« 


6.82 X 10- ^ 


^1 


3.18 X 10-« 


4 


1739912 


6.48 


5.61 X 10-^ 


9.03 X 10- ^ 


1.26 X 10-^ 



For quantitative evaluation, we perform all the compared 
methods on the "laboratory" sequence from a public surveil- 
lance database ( [Benedek and Sziranyi| ( |2QQ8| )) which has 
ground truth foreground. Both the false negative rate (FNR) 
and the false positive rate (FPR) are calculated in the sense 
of foreground detection. FNR indicates the ability of the 
method to correctly recover the foreground while the FPR 
represents the power of a method on distinguishing the back- 
ground. These two scores correspond to the Type I and Type 
II errors in the statistical test theor}p]and are judged by the 
criterion that the smaller the better. One can see from Table[3] 
that RSL has the lowest FNR but the highest FPR among the 
compared methods. This reveals that RSL could not exactly 
distinguish the background. Although the speed of our li fil- 
tering is slightly slower than median filtering on 20 frames, 



its performance is as good as S-PCP, which achieves the best 
results but with the highest time cost. 



Table 3 Comparison among median filter (Median for short), RSL, S- 
PCP, and h filtering on background modeling problem. "Resolution" 
and "No. Frames" denote the size of each frame and the number of 
frames in a video sequence, respectively. We present FNR, FPR and the 
CPU time (in seconds) for the "laboratory" data set. For our collected 
"meeting" data set, we only report the CPU time because there is no 
ground truth foreground for this video sequence. 



Video 


- 1 Median | RSL | S-PCP | h 


"laboratory" 


Resolution: 240 x 320, No. Frames: 887 


FNR 


9.85 


7.31 


8.61 


8.62 


FPR 


9.18 


10.83 


8.72 


8.76 


Time 


42.90 


3159.92 


10897.96 


48.99 


"meeting" 


Resolution: 576 x 720, No. Frames: 700 


Time 1 179.19 | N.A. | N.A. | 178.74 



Please refer to http://en.wikipedia.org/wiki/Type_Land_type_n_errors 
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To further test the performance of h filtering on large 
scale data set, we also collect a video sequence (named "meet- 
ing") of 700 frames, each of which has a resolution 576 x 
720. So the data matrix is of size greater than 700 x 400000, 
which cannot be fit into the memory of our PC. As a result, 
we cannot use the standard ADM to solve the corresponding 
PCP problem. As for RSL, we have found that it did not con- 
verge on this data. Thus we only compare the performance 
of median filter and li filtering. The time cost is reported 
in Table [3] and the qualitative comparison is shown in Fig- 
ure [8] We can see that h filtering is as fast as median filter- 
ing with 20 frames, and median filter fails on this data set. 
This is because the mechanism of median filter is based on 
the (local) frame difference. Thus when the scene contains 
slowly moving objects (such as that in the "meeting" video 
sequence), median filter will not give good results. In con- 
trast, the background and the foreground can be separated 
satisfactorily by li filtering. This makes sense because our 
li filtering can exactly recover the (global) low-rank struc- 
ture for the background and remove the foreground as sparse 
errors. 



5 Conclusion and Further Work 

In this paper, we propose the first linear time algorithm, 
named the h filtering method, for exactly solving very large 
PCP problems, whose ranks are supposed to be very small 
compared to the data size. It first recovers a seed matrix and 
then uses the seed matrix to filter some rows and columns of 
the data matrix. It avoids SVD on the original data matrix, 
and the h filtering step can be done in full parallelism. As a 
result, the time cost of our li filtering method is only linear 
with respect to the data size, making applications of RPCA 
to extremely large scale problems possible. The experiments 
on both synthetic and real world data demonstrate the high 
accuracy and efficiency of our method. It is possible that 
the proposed technique can be applied to other large scale 
nuclear norm minimization problems, e.g., matrix comple- 
tion ( |Cai et al ( 2010 )) and low-rank representation ( |Liu et al| 
( |2010| )). This will be our future work. 
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(a) Original (b) Background (Median) (c) Foreground (Median) (d) Background (/i) (e) Foreground (/i) 



Fig. 8 The partial background modeling results of median filter and h filtering on the "meeting" video sequence, (b)-(c) and (d)-(e) are the the 
background (L*) and the foreground (S*) recovered by median filter and h filtering, respectively. 
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